WinHelp File Format -- Additional Internal Files
------------------------------------------------

The September 1993 and October 1993 issues of Dr. Dobb's Journal
contain (in the "Undocumented Corner") a detailed description by Pete
Davis of the WinHelp file format, used in the Windows .HLP and .MVB
files. Unfortunately, for space reasons only a limited number (though
hopefully the most important) of the internal files that make up a
.HLP file were discussed. All internal files are shown in WHSTRUCT.H
and in HELPDUMP.C. Here are detailed descriptions of the internal
files not discussed in the article. This probably won't make much
sense if you haven't read the two-part DDJ article.

|FONT
-----

The |FONT file has three parts: a header, a list of available fonts,
and a list of font descriptors.

Following the file header for the |FONT file is the FONTHEADER (see
WHSTRUCT.H) record. This is a 4 word field. The first word is the
number of fonts available to the help file. The second word is the
number of font descriptors actually used in the help file. The third
is the default font descriptor and the last is the offset to the
descriptors list.

Immediately following the FONTHEADER is a list of font names. These
are all 20 character fixed length records. Each font name is null
terminated, so font names can be up to 19 characters long.

Immediately following the fonts is the descriptor list. The font
descriptors are individual instances of fonts that are actually used.
For example, if you use 10 pt Helvetica, then a descriptor is created
for that. If later you use 12 pt Helvetica, or Bold 10 pt Helvetica,
different descriptors are created. Different descriptors are created
for the following:

    1) Using a different font
    2) Using a different point size
    3) Using a different attribute (Bold, Underline, Italics, etc)
    4) Using a different color

The first byte of the Font Descriptor is the attribute. This has
attributes like Underline, Bold, etc.
The second byte is the size of the font in half points. Therefore an
8 pt font has a half-point size of 0x10. The third byte is the family
of the font. The fourth byte is the name of the font, given as an
index into the font list preceding the font descriptor list. The last
6 bytes are the colors for foreground and background. Actually, the
background color is just a guess. Changing these values has no effect
on the font as WinHelp displays it. I'm guessing it was a planned
enhancement.

|CONTEXT Hash values
--------------------

The |CONTEXT file contains hash values for all the keywords and
context strings. This makes it easy to search on keywords and context
strings. Simply calculate the hash value of the string and search the
|CONTEXT file for a matching hash value.

Since the hash values can't be reversed, I have included a simple
program called MAKEHASH.C. This will simply take a string from the
command line and convert it to a hash value. The hash algorithm uses a
conversion table to remove case-sensitivity and reduce the number of
characters involved in the hash.

/********************************************
  MakeHash.C
  Pete Davis

  Calculates and outputs the hash value of a
  string. These hash values are used in the
  |CONTEXT file of a WinHelp .HLP file.
*********************************************/

#include <stdio.h>

char MapTable[256];   /* entries not set by BuildMap() stay 0 */

/* Function prototypes */
void BuildMap(void);
long Hash(char *);

/********************************************
  Builds character set map for hash function.
*********************************************/
void BuildMap(void)
{
    char c;
    int  counter;

    /* Map A-Z and a-z to 17-42 (case-insensitive). */
    for (counter = 'A', c = 17; counter <= 'Z'; counter++, c++)
        MapTable[counter] = MapTable[counter + 32] = c;

    /* Map the digits 1-9 to 1-9; '0', '.' and '_' get their own codes. */
    for (counter = '1', c = 1; counter <= '9'; counter++, c++)
        MapTable[counter] = c;

    MapTable['0'] = 0x0A;
    MapTable['.'] = 0x0C;
    MapTable['_'] = 0x0D;
}

/********************************************
  Hash function by Ron Burk
*********************************************/
long Hash(char *p)
{
    long h = 0;

    while (*p) {
        /* Cast so characters above 127 can't produce a negative index. */
        char c = MapTable[(unsigned char)*p++];
        h = h * 0x2B + c;
    }
    return h;
}

int main(int argc, char *argv[])
{
    long HashVal;

    if (argc < 2) {
        fprintf(stderr, "usage: makehash string\n");
        return 1;
    }

    BuildMap();
    HashVal = Hash(argv[1]);
    printf(" Hash value = %ld\n", HashVal);
    return 0;
}

|KWMAP, |KWBTREE, and |KWDATA
-----------------------------

These three files are used together to get the keywords and their
offsets to topics. These are the default keyword files. The default
letter associated with keywords in WinHelp is 'K'. Using the MULTIKEY
option in the .HPJ file, though, you can have multiple keyword files
based on different letters. If, for example, you use the MULTIKEY=V
option, you will have |VWMAP, |VWBTREE, and |VWDATA files associated
with the 'V' keywords. We're going to stick with the 'K' keywords in
our discussion, and these are the only ones handled by the HELPDUMP
program. The other keyword files are handled in exactly the same way
as the 'K' keywords, so everything here applies.

The |KWMAP file is the simplest. It starts with a single long that
gives the number of KWMAPREC records. This is followed by a list of
KWMAPREC records (see WHSTRUCT.H). The FirstRec field is the number of
the first keyword to appear on the given leaf page. The PageNum field,
therefore, is the page number associated with the keywords.

For example, if you have 3 leaf pages in the |KWBTREE file, then there
will be 3 KWMAPREC records in |KWMAP. If there are, say, 60 keywords
on page 0, 45 keywords on page 1, and 52 keywords on page 2, then the
three records in |KWMAP would look like the ones in Figure 1.

Figure 1
--------------------------------------
         | Rec 1 | Rec 2 | Rec 3
---------+----------------------------
FirstRec |   0   |  60   |  105
PageNum  |   0   |   1   |   2

The |KWBTREE file has a list of all the keywords in the help file.
Each keyword has a count and an offset associated with it. The count
is the number of occurrences of the keyword in the help file. The
offset is relative to the beginning of the |KWDATA file. If you have a
keyword with a count of 3 and an offset of 16 (decimal), then you
would go to byte offset 16 of the |KWDATA file and read the next 3
longs. Each of these longs is an offset to the location in the |TOPIC
file of an occurrence of the keyword.

WinHelp uses this information as follows. When you select the "Search"
button, you are given a list of keywords. If you double-click on a
keyword, all the topics with occurrences of that keyword are listed.
The topic titles are actually pulled from the |TTLBTREE file, which
has the topics and offsets. You would simply match the offsets from
|KWDATA with the offsets in |TTLBTREE to get the topic titles.

|CONTEXT
--------

The |CONTEXT file is a b-tree like the |TTLBTREE and |KWBTREE. It uses
2k page sizes and the same structure for the header, index nodes, and
leaf nodes.

The |CONTEXT file's data is simply a list of CONTEXTRECs, which
consist of a hash value and a topic offset (see WHSTRUCT.H). The hash
values are for context strings used in the help file. The context
strings are basically the hot links in the text. I question the need
for this table, since each context string in the text already carries
its hash value. It would have been easier to just have the topic
offset associated with the hot-links instead of the hash value. You
don't even have to actually calculate the hash value, because it is
provided with the hot-link itself.

|CTXOMAP
--------

In the .HPJ file you can set up context-sensitive points in your help
file by adding a section titled [MAP].
Under the [MAP] section you list Topic Titles and assign unique IDs to
each one. A sample [MAP] section could look like this:

[MAP]
TableOfContents   0x0010
Introduction      0x0020
Chapter1          0x0030
Chapter2          0x0040
...
Chapter10         0x0120
Glossary          0x0130

You can then use these numbers in the WinHelp API function to jump to
a specific topic.

This information is listed in the |CTXOMAP file. The first WORD of the
file is the number of entries in the Context Map table. This is
followed by the individual CTXOMAPREC records (See WHSTRUCT.H). The
CTXOMAP records simply have the unique ID provided in the [MAP]
section followed by the offset to the topic specified.

-- Pete Davis, September 1993
   CIS 71644,3570
   WPJ BBS 703-503-3021
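
As an illustration of the |CTXOMAP layout described above, here is a
minimal sketch of a lookup that turns a [MAP] ID into a topic offset.
It is not taken from HELPDUMP.C. The names CtxoMapEntry, read_u16,
read_u32 and ctxomap_lookup are made up for this example, and the
record layout (a 32-bit ID followed by a 32-bit topic offset, both
little-endian) is an assumption that should be checked against
CTXOMAPREC in WHSTRUCT.H. The buffer is assumed to start at the
entry-count WORD, after the internal file header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of one |CTXOMAP entry. The field widths are an
   assumption; check CTXOMAPREC in WHSTRUCT.H. */
typedef struct {
    uint32_t map_id;        /* ID assigned in the [MAP] section */
    uint32_t topic_offset;  /* offset of the topic */
} CtxoMapEntry;

/* Read little-endian values byte by byte so the code does not depend
   on host byte order or struct packing. */
static uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Linear search of the |CTXOMAP contents (buf, len bytes) for map_id.
   Returns 1 and fills *out on success; returns 0 if the ID is absent
   or the buffer is too short for the stated number of entries. */
int ctxomap_lookup(const unsigned char *buf, size_t len,
                   uint32_t map_id, CtxoMapEntry *out)
{
    size_t count, i;

    if (len < 2)
        return 0;
    count = read_u16(buf);              /* first WORD: number of entries */
    if (len < 2 + count * 8)            /* assumed 8 bytes per record */
        return 0;

    for (i = 0; i < count; i++) {
        const unsigned char *rec = buf + 2 + i * 8;
        if (read_u32(rec) == map_id) {
            out->map_id = map_id;
            out->topic_offset = read_u32(rec + 4);
            return 1;
        }
    }
    return 0;
}
```

With the sample [MAP] section above, calling ctxomap_lookup with
map_id 0x0020 would, under these assumptions, return the topic offset
stored for Introduction.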